Journal of Beijing University of Posts and Telecommunications

  • EI core journal

JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM ›› 2007, Vol. 30 ›› Issue (6): 40-45. doi: 10.13190/jbupt.200706.40.028

• Papers

Collocation Extraction Based on Relative Conditional Entropy

WANG Daliang1, ZHANG Dezheng1, TU Xuyan1, ZHENG Xuefeng1, TONG Zijian2   

  1. School of Information Engineering, University of Science and Technology, Beijing 100083, China;
  2. Department of Research and Development, Sohu.com Inc, Beijing 100084, China
  • Received: 2007-02-04 Revised: 2007-03-26 Online: 2007-12-31 Published: 2007-12-31
  • Contact: WANG Daliang

Abstract:

Previous research on collocation extraction treated lexical combination as simply putting terms together and ignored collocation preference. To address this problem, this paper proposes a statistical model of collocation preference based on relative conditional entropy, which measures the dependence between a headword and the words that co-occur with it in context. Linguistic heuristic rules are then integrated to identify collocation boundaries by means of a part-of-speech filter and a sliding window. Finally, a collocation extraction approach is formulated. The approach effectively reveals the internal mechanism of collocation and is easier to interpret. It is shown that collocation preference strength can be regarded as mutual information corrected for direction.
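
The idea of a direction-sensitive association score over a sliding window can be illustrated with a minimal sketch. The abstract does not give the relative conditional entropy formula, so the code below substitutes plain pointwise mutual information, computed separately for words to the left and to the right of the headword. The function name `directional_pmi`, the `window` parameter, and the relative-frequency estimates are assumptions made for illustration, not the authors' model, and no part-of-speech filter is applied.

```python
import math
from collections import Counter


def directional_pmi(tokens, headword, window=2):
    """Score words co-occurring with `headword` inside a sliding window.

    Returns {(word, direction): PMI}, where direction is "L" if the word
    precedes the headword and "R" if it follows it.
    NOTE: plain PMI is an illustrative stand-in; it is NOT the relative
    conditional entropy measure proposed in the paper.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    pair = Counter()

    for i, tok in enumerate(tokens):
        if tok != headword:
            continue
        # Visit every position within `window` tokens of the headword.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j == i:
                continue
            direction = "L" if j < i else "R"
            pair[(tokens[j], direction)] += 1

    scores = {}
    p_head = unigram[headword] / n
    for (word, direction), count in pair.items():
        # PMI = log2( P(head, word, direction) / (P(head) * P(word)) ),
        # using simple relative frequencies over the token sequence.
        p_joint = count / n
        p_word = unigram[word] / n
        scores[(word, direction)] = math.log2(p_joint / (p_head * p_word))
    return scores


if __name__ == "__main__":
    text = "strong tea is good strong tea wins".split()
    for key, value in sorted(directional_pmi(text, "tea", window=1).items()):
        print(key, round(value, 3))
```

Keeping the left and right counts apart is what lets a score differ by direction; a symmetric measure would merge them into a single count per word pair.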

Key words: natural language processing, collocation extraction, relative entropy, collocation preference
